Visual Dictionary Learning for Joint Object Categorization and Segmentation
نویسندگان
چکیده
Representing objects using elements from a visual dictionary is widely used in object detection and categorization. Prior work on dictionary learning has shown improvements in the accuracy of object detection and categorization by learning discriminative dictionaries. However none of these dictionaries are learnt for joint object categorization and segmentation. Moreover, dictionary learning is often done separately from classifier training, which reduces the discriminative power of the model. In this paper, we formulate the semantic segmentation problem as a joint categorization, segmentation and dictionary learning problem. To that end, we propose a latent conditional random field (CRF) model in which the observed variables are pixel category labels and the latent variables are visual word assignments. The CRF energy consists of a bottom-up segmentation cost, a top-down bag of (latent) words categorization cost, and a dictionary learning cost. Together, these costs capture relationships between image features and visual words, relationships between visual words and object categories, and spatial relationships among visual words. The segmentation, categorization, and dictionary learning parameters are learnt jointly using latent structural SVMs, and the segmentation and visual words assignments are inferred jointly using energy minimization techniques. Experiments on the Graz02 and CamVid datasets demonstrate the performance of our approach.
منابع مشابه
Sparse Dictionaries for Semantic Segmentation
A popular trend in semantic segmentation is to use top-down object information to improve bottom-up segmentation. For instance, the classification scores of the Bag of Features (BoF) model for image classification have been used to build a top-down categorization cost in a Conditional Random Field (CRF) model for semantic segmentation. Recent work shows that discriminative sparse dictionary lea...
متن کاملJoint Dictionary and Classifier Learning for Categorization of Images Using a Max-margin Framework
The Bag-of-Visual-Words (BoVW) model is a popular approach for visual recognition. Used successfully in many different tasks, simplicity and good performance are the main reasons for its popularity. The central aspect of this model, the visual dictionary, is used to build mid-level representations based on low level image descriptors. Classifiers are then trained using these mid-level represent...
متن کاملDecomposed Learning for Joint Object Segmentation and Categorization
We present a learning algorithm for joint object segmentation and categorization that decomposes the original problem into two sub-tasks and admits their bidirectional interaction. In the first stage, in order to decompose output space, we train category-specific segmentation models to generate figure-ground hypotheses. In the second stage, by taking advantage of object figure-ground informatio...
متن کاملThe effects of segmentation and redundancy methods on cognitive load and vocabulary learning and comprehension of English lessons in a multimedia learning environment
The present study was conducted with the aim of the effects of segmentation and redundancy methods on cognitive load and vocabulary learning and comprehension of English lessons in a multimedia learning environment.The purpose of this study is an applied research and a real experimental study. The statistical population of the present study includes all people aged 14 to 16 who are enrolled in ...
متن کاملBoosting VLAD with Supervised Dictionary Learning and High-Order Statistics
Recent studies show that aggregating local descriptors into super vector yields effective representation for retrieval and classification tasks. A popular method along this line is vector of locally aggregated descriptors (VLAD), which aggregates the residuals between descriptors and visual words. However, original VLAD ignores high-order statistics of local descriptors and its dictionary may n...
متن کامل